Oriya Language Text Mining Using C5.0 Algorithm

Authors

  • Sohag Sundar Nanda
  • Soumya Mishra
  • Sanghamitra Mohanty
Abstract

Text mining is essential for discovering knowledge in texts, which are available in many forms and carry information relevant to users' needs. In this paper we describe a tourist decision support system that mines data about tourist places in Orissa from Oriya text files, translates and preprocesses the data, and classifies the tourist places into three classes using the C5.0 algorithm. The results are then used to help international tourists select places of interest according to their preferences. Oriya is the official language of Orissa, a state in eastern India, and more than 31 million people speak and write it. It has a rich heritage and culture, and much knowledge is stored in Oriya-language text. We also sketch our ongoing and future work on the same tourism datasets using field force automation and opinion mining techniques.

Keywords: text mining, decision support system, classification, C5.0, machine translation.
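As a rough illustration of the classification step, the sketch below computes the gain ratio, the entropy-based split criterion of C4.5, the predecessor that C5.0 builds on. It is not the authors' implementation. The attribute names (`kind`, `coastal`), the records, and the three class labels are hypothetical stand-ins for the translated and preprocessed tourism data, which the abstract does not specify.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, divided by the split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records; the real features and classes are not given in the abstract.
rows = [
    {"kind": "temple", "coastal": "no"},
    {"kind": "temple", "coastal": "yes"},
    {"kind": "beach", "coastal": "yes"},
    {"kind": "beach", "coastal": "yes"},
    {"kind": "wildlife", "coastal": "no"},
    {"kind": "wildlife", "coastal": "no"},
]
labels = ["heritage", "heritage", "leisure", "leisure", "nature", "nature"]

# The attribute with the highest gain ratio would become the root test of the tree.
best = max(rows[0], key=lambda attr: gain_ratio(rows, labels, attr))
print(best, {a: round(gain_ratio(rows, labels, a), 3) for a in rows[0]})
```

A full tree learner would apply this selection recursively to each subset; C5.0 additionally offers features such as boosting and pruning that this sketch omits.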

Similar resources

Evaluation of Data Mining Algorithms for Detection of Liver Disease

Background and Aim: The liver, one of the largest internal organs in the body, is responsible for many vital functions, including purifying blood, regulating the body's hormones, and storing glucose. Disruptions in these functions can therefore sometimes be irreparable. Early prediction of these diseases will help their early and effective treatm...

Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms

Pollutant forecasting is an important problem in the environmental sciences. Data mining is an approach to discovering knowledge from large data sets. This paper uses data mining methods to forecast the concentration level of fine particulate matter, an important air pollutant. Several tree-based classification algorithms are available in data mining, such as CART, C4.5, Random Forest (RF) and C5.0. RF and C5....

Text Mining Technique for Data Mining Application

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. The decision tree approach is most useful in classification problems. With this technique, a tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to ...

Application of text mining for customer evaluations in commercial banking

Customer attrition is an increasingly serious problem for commercial banks. To address it comprehensively, mining customer evaluation texts is as important as mining structured customer data. To extract hidden information from customer evaluations, textual feature selection, classification, and association rule mining are necessary techniques. This paper presents all three techniques by u...

Automating XML Markup using Machine Learning Techniques

In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algori...

Publication date: 2011